Fine-tuning the polynomial model
Let’s address a couple of the to-dos from the previous pages: check the degree of the polynomial with a grid search, and check out other climatic variables, playing a bit with variable selection.
Tune polynomial degree
First, let’s fine-tune the degree of the polynomial. From the initial scatter plots, a degree-2 or degree-3 polynomial looks about right, but let’s run a grid search on it.
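A minimal sketch of such a degree search, assuming a pipeline shaped like the one shown further down this page (`polynomial` step feeding a linear model). The synthetic stand-in data and the candidate degree list are illustrative, not the actual setup:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

pipe = Pipeline([
    ("polynomial", PolynomialFeatures()),
    ("model", LinearRegression()),
])

# Score each candidate degree with both MAE and RMSE; refit on MAE.
grid = GridSearchCV(
    pipe,
    param_grid={"polynomial__degree": [1, 2, 3, 4, 5, 6]},
    scoring={"mae": "neg_mean_absolute_error",
             "rmse": "neg_root_mean_squared_error"},
    refit="mae",
    cv=5,
)

# Synthetic stand-in: cubic-ish consumption vs. daily mean temperature.
rng = np.random.default_rng(0)
X = rng.uniform(-5, 25, size=(300, 1))
y = 30 - 2 * X[:, 0] + 0.005 * X[:, 0] ** 3 + rng.normal(0, 2, 300)

grid.fit(X, y)
print(grid.best_params_)
```

With a dict of scorers, `grid.cv_results_` also keeps the per-degree MAE and RMSE side by side, which is what makes the metric tension below visible.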
♻️ stepit 'grid_search_pipe': is up-to-date. Using cached result for `strom.modelling.grid_search_pipe()` 2025-11-22 16:19:31
It seems degree 4 is best. There is some small variation depending on the metric/scorer used, but overall 4 looks best. The differences are rather small, but the tension between MAE and RMSE persists: a model that fits the warmer season better has many small errors and a small MAE, but tends to produce a few very large errors that inflate the RMSE, and the other way around.
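The MAE/RMSE tension is easy to see in a toy example (illustrative numbers only): many moderate errors and a few large ones can rank in opposite order under the two metrics, because RMSE squares the errors before averaging.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.zeros(10)
errors_small_spread = np.full(10, 2.0)             # moderate error everywhere
errors_one_outlier = np.array([0.5] * 9 + [10.0])  # mostly tiny, one big miss

for name, e in [("moderate everywhere", errors_small_spread),
                ("one large miss", errors_one_outlier)]:
    mae = mean_absolute_error(y_true, e)
    rmse = np.sqrt(mean_squared_error(y_true, e))
    print(f"{name}: MAE={mae:.2f}, RMSE={rmse:.2f}")
```

MAE prefers the “one large miss” predictions (1.45 vs. 2.00), while RMSE prefers the moderate ones (2.00 vs. about 3.20) — exactly the trade-off between the seasonal fits described above.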
Variable selection
Now let’s try a brute-force approach to variable selection: a not-so-thoughtful but quick and effective way of trying out other climatic variables, and of checking the associations with relative humidity and the rest, to see whether variables like humidity carry an actual signal or just noise.
Just let it crunch through a bunch of variable combinations. Many of them will be nonsensical or irrelevant, but it’s fast to write, and the machine does the work, not me.
From domain knowledge and the first correlations observed, we would expect temperature to play the key role in the model. Yet other variables such as humidity, pressure or dew point could also be relevant. So let’s throw all of that into a grid search and see what it spits out.
- tt: air temperature at 2 m height (Temperatur der Luft in 2m Hoehe), °C
- rf_tu: relative humidity (relative Feuchte), %
- td: dew point temperature (Taupunktstemperatur), °C
- vp_std: computed hourly vapour pressure (berechnete Stundenwerte des Dampfdruckes), hPa
- tf_std: computed hourly wet-bulb temperature (berechnete Stundenwerte der Feuchttemperatur), °C
- p_std: hourly air pressure (Stundenwerte Luftdruck), hPa
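A sketch of the brute-force enumeration, assuming a pipeline with a column-selection step like the `vars`/`ColumnSelector` step shown further down this page. The `ColumnSelector` here is a minimal stand-in, and the data is synthetic; only the column names match the variables above:

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Minimal stand-in: keep only the requested DataFrame columns."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.columns]


cols = ["tt_tu_mean", "rf_tu_mean", "td_mean",
        "vp_std_mean", "tf_std_mean", "p_std_mean"]

# Every non-empty subset of the six candidate columns: 2**6 - 1 = 63.
subsets = [list(c) for r in range(1, len(cols) + 1)
           for c in combinations(cols, r)]

pipe = Pipeline([
    ("vars", ColumnSelector()),
    ("polynomial", PolynomialFeatures(degree=2)),
    ("model", LinearRegression()),
])

# Synthetic stand-in data: only tt_tu_mean actually drives the target.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(200, len(cols))), columns=cols)
y = -2 * X["tt_tu_mean"] + rng.normal(0, 0.5, 200)

grid = GridSearchCV(pipe, {"vars__columns": subsets}, cv=3,
                    scoring="neg_mean_absolute_error")
grid.fit(X, y)
print(grid.best_params_["vars__columns"])
```

Grid-searching over the subsets is wasteful (63 candidate column sets here, each cross-validated), but that is exactly the "let the machine work" trade-off described above.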
♻️ stepit 'grid_search_pipe': is up-to-date. Using cached result for `strom.modelling.grid_search_pipe()` 2025-11-22 16:19:32
{'vars__columns': ['tt_tu_mean', 'vp_std_mean', 'tf_std_mean']}
Well, humidity and the other climatic variables only improve the model marginally. A proper mediation analysis would still be in order, but this at least sheds some light on it. Interestingly, some models without temperature but with the set of other climatic variables almost match the performance of the best model with temperature. Overall, it seems that at least temperature, humidity and pressure should be considered. Yet they tend to be collinear, so we can’t just use them all at once in this kind of model.
One last brute-force approach for today: let the grid search automatically choose the best model.
Best parameters
{'polynomial__degree': 4, 'vars__columns': ['tf_std_mean']}
Pipeline(steps=[('vars', ColumnSelector(columns=['tf_std_mean'])),
('polynomial', PolynomialFeatures(degree=4)),
                ('model', LinearRegression())])
Assessment of the best model from that brute-force approach
♻️ stepit 'poly': is up-to-date. Using cached result for `strom.modelling.assess_model()` 2025-11-22 16:19:33
Metrics

Scatter plot matrix
Observed vs. Predicted and Residuals vs. Predicted
Check the residuals to assess the goodness of fit:
- White noise, or is there a pattern?
- Heteroscedasticity?
- Non-linearity?
Normality of residuals:
- Are the residuals normally distributed?
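These checks can also be scripted instead of eyeballed. A minimal sketch with synthetic stand-in residuals and fitted values (the real ones would come from the fitted pipeline above): a Shapiro–Wilk test for normality, the lag-1 autocorrelation as a white-noise check, and a rank correlation between absolute residuals and fitted values as a quick heteroscedasticity probe.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
residuals = rng.normal(0, 1.5, 365)   # stand-in for daily model residuals
fitted = rng.uniform(0, 50, 365)      # stand-in for the predictions

# Normality: Shapiro-Wilk (null hypothesis: residuals are normal).
stat, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p:.3f}")

# Lag-1 autocorrelation: near 0 for white noise, clearly positive if
# whole stretches (e.g. cold spells) are over- or under-predicted.
r1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
print(f"lag-1 autocorrelation: {r1:.3f}")

# Heteroscedasticity: |residual| growing with the fitted value would
# show up as a positive rank correlation here.
rho, _ = stats.spearmanr(np.abs(residuals), fitted)
print(f"|residual| vs fitted Spearman rho: {rho:.3f}")
```

None of this replaces looking at the plots, but it gives numbers to track when the model changes.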




Leverage
Scale-Location plot


Residuals Autocorrelation Plot


Residuals vs Time
Compare models
So let’s compare the models with and without humidity.
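A sketch of such a comparison: score each candidate on the same cross-validation splits with both MAE and RMSE. The feature sets and degrees mirror the models shown below; the data is a synthetic stand-in, and the shuffled K-fold is a simplification (a time-series split would suit this data better).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
n = 300
temp = rng.uniform(-5, 25, n)
humidity = rng.uniform(30, 100, n)
y = 40 - 1.5 * temp + rng.normal(0, 3, n)

candidates = {
    "temp + humidity, degree 2": (
        np.c_[temp, humidity],
        make_pipeline(PolynomialFeatures(2), LinearRegression())),
    "temp only, degree 4": (
        temp[:, None],
        make_pipeline(PolynomialFeatures(4), LinearRegression())),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # same splits for all
scoring = ["neg_mean_absolute_error", "neg_root_mean_squared_error"]
results = {}
for name, (X, model) in candidates.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {s: -scores[f"test_{s}"].mean() for s in scoring}
    print(name, results[name])
```

Fixing the `KFold` object (rather than passing `cv=5`) guarantees every model sees the identical splits, so metric differences reflect the models, not the folds.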
Cross-validation messages
♻️ stepit 'cross_validate_pipe': is up-to-date. Using cached result for `strom.modelling.cross_validate_pipe()` 2025-11-22 16:19:37
♻️ stepit 'cross_validate_pipe': is up-to-date. Using cached result for `strom.modelling.cross_validate_pipe()` 2025-11-22 16:19:37
♻️ stepit 'cross_validate_pipe': is up-to-date. Using cached result for `strom.modelling.cross_validate_pipe()` 2025-11-22 16:19:37
Metrics
Single split
Metrics based on the test set of the single split
Cross validation
Predictions, residuals, observed
Time vs. Predicted and Observed
Time vs. Residuals
Model details
Pipeline(steps=[('vars', ColumnSelector(columns=['tt_tu_mean', 'rf_tu_mean'])),
('polynomial', PolynomialFeatures()),
                ('model', LinearRegression())])
Pipeline(steps=[('vars', ColumnSelector(columns=['tt_tu_mean', 'tf_std_mean'])),
('polynomial', PolynomialFeatures(degree=4)),
                ('model', LinearRegression())])
Pipeline(steps=[('vars', ColumnSelector(columns=['tf_std_mean'])),
('polynomial', PolynomialFeatures(degree=4)),
                ('model', LinearRegression())])
A nice improvement over the baseline model, and the model without humidity is slightly better across all metrics — a rather small improvement, but a consistent one.